Fundamental Experimental Research in Machine Learning

Author

  • Thomas G. Dietterich
Abstract

Fundamental research in machine learning is inherently empirical, because the performance of machine learning algorithms is determined by how well their underlying assumptions match the structure of the world. Hence, no amount of mathematical analysis can determine whether a machine learning algorithm will work well. Experimental studies are required.

To understand this point, consider the well-studied problem of supervised learning from examples. This problem is usually stated as follows. An example x_i is an n-tuple drawn from some set X according to some fixed, unknown probability distribution D. An unknown function f is applied to each example to produce a label y_i = f(x_i). The labels may be either real-valued quantities (in which case the problem is referred to as a regression problem) or discrete symbols (in which case the problem is referred to as a classification problem). The goal of machine learning algorithms is to construct an approximation h to the unknown function f such that, with high probability, a new example x ∈ X drawn according to D will be labeled correctly: h(x) = f(x).

For example, consider the problem of diagnosing heart disease. The examples consist of features describing the patient, such as age, sex, whether the patient smokes, blood pressure, results of various laboratory tests, and so forth. The label indicates whether the patient was diagnosed with heart disease. The task of the learning algorithm is to learn a decision-making procedure that will make correct diagnoses for future patients.

Learning algorithms work by searching some space of hypotheses, H, for the hypothesis h that is "best" in some sense. Two fundamental questions of machine learning research are (a) what are good hypothesis spaces to search and (b) what definitions of "best" should be used? For example, a very popular hypothesis space H is the space of decision trees, and the definition of "best" is the hypothesis that minimizes the so-called pessimistic error estimate (Quinlan, 1993).

It can be proved that if all unknown functions f are equally likely, then all learning algorithms will have identical performance, regardless of which hypothesis space H they search and which definition of "best" they employ (Wolpert, 1996; Schaffer, 1994). These so-called "no free lunch" theorems follow from the simple observation that the only information a learning algorithm has is the training examples. And the training examples do not provide any information about the labels of new points …
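
The "no free lunch" claim can be checked by brute force on a tiny problem. The sketch below is an illustration only, not the construction used in the cited proofs: it assumes a noiseless Boolean target on 3-bit inputs, a fixed split into five training points and three unseen points, and two deterministic learners. The names `majority_learner`, `nearest_neighbor_learner`, and `off_training_accuracy` are hypothetical. Averaging over all 256 possible target functions f, each weighted equally, both learners come out at exactly chance on the unseen points.

```python
from itertools import product

# Tiny domain: all 3-bit inputs (8 points). Train on 5, evaluate on the other 3.
X = list(product([0, 1], repeat=3))
train_x, test_x = X[:5], X[5:]


def majority_learner(train):
    """Predict the majority training label everywhere (ties go to 1)."""
    ones = sum(y for _, y in train)
    label = 1 if 2 * ones >= len(train) else 0
    return lambda x: label


def nearest_neighbor_learner(train):
    """Predict the label of the closest training point in Hamming distance."""
    def h(x):
        nearest = min(train, key=lambda ex: sum(a != b for a, b in zip(ex[0], x)))
        return nearest[1]
    return h


def off_training_accuracy(learner):
    """Accuracy on unseen points, averaged over every target f with equal weight."""
    correct = 0
    trials = 0
    for labels in product([0, 1], repeat=len(X)):
        f = dict(zip(X, labels))                        # one candidate target function
        h = learner([(x, f[x]) for x in train_x])       # learner sees training labels only
        correct += sum(h(x) == f[x] for x in test_x)    # score on the unseen points
        trials += len(test_x)
    return correct / trials


for learner in (majority_learner, nearest_neighbor_learner):
    print(learner.__name__, off_training_accuracy(learner))
```

Any other deterministic learner can be passed to `off_training_accuracy` in the same way. Under the uniform weighting the result does not change, because for every fixed set of training labels the unseen labels take each value equally often. A learner can only do better than chance when some targets are more likely than others, which is a property of the world, not of the algorithm.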

Similar resources

Machine Reliability in a Dynamic Cellular Manufacturing System: A Comprehensive Approach to a Cell Layout Problem

The fundamental function of a cellular manufacturing system (CMS) is based on defining and recognizing a type of similarity among the parts to be produced in a planning period. Cell formation (CF) and cell layout design are two important steps in implementing a CMS. This paper presents a new nonlinear mathematical programming model for dynamic cell formation that employs the ...

Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/ TP53+, MSS/TP53-): A Network-based and Machine Learning Approach

Gastric cancer (GC) is one of the leading causes of cancer mortality worldwide. Molecular understanding of the different GC subtypes remains poor, and new subtype-specific diagnostic and therapeutic approaches are needed. Comprehensive research in this area is therefore required to gain deeper insight into the molecular processes underlying these subtypes. In this st...

Buckling of Cylindrical Shells with a Quasi-Elliptical Cutout under Axial Compression

Understanding how a cutout influences the load-bearing capacity and buckling behavior of cylindrical shells is fundamental in the design of structural components used in automobiles, aircraft, and marine structures. In this article, simulation and analysis of steel cylindrical shells of various lengths, each including a quasi-elliptical cutout and subjected to axial compression, were systematically carri...

Phase Transitions in Machine Learning

Phase transitions typically occur in combinatorial computational problems and have important consequences, especially with the current spread of statistical relational learning and of sequence learning methodologies. In Phase Transitions in Machine Learning the authors begin by describing in detail this phenomenon and the extensive experimental investigation that supports its presence. They the...

An Efficient Explanation of Individual Classifications using Game Theory

We present a general method for explaining individual predictions of classification models. The method is based on fundamental concepts from coalitional game theory and predictions are explained with contributions of individual feature values. We overcome the method’s initial exponential time complexity with a sampling-based approximation. In the experimental part of the paper we use the develo...

Publication date: 1997